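Gait-phase-based control is a popular research topic for walking-aid robots, especially robotic lower-limb prostheses. Gait phase estimation is a key challenge in gait-phase-based control. Previous studies used the integral or derivative of the human thigh angle to estimate the gait phase, but accumulated measurement errors and noise can degrade the estimation results. In this paper, we propose a more robust gait phase estimation method that uses a unified form of piecewise monotonic models relating the gait phase to the thigh angle across various locomotion modes. The gait phase is estimated solely from the thigh angle, a stable variable, which avoids phase drift. A Kalman filter-based smoother is designed to further suppress abrupt changes in the estimated gait phase. Based on the proposed estimation method, a gait-phase-based joint angle tracking controller is designed for a transfemoral prosthesis. The proposed gait phase estimation method, the smoother, and the controller are evaluated through offline analysis of walking data in various locomotion modes. The real-time performance of the gait-phase-based controller is validated in experiments on a transfemoral prosthesis.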
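The abstract does not give the smoother's model. As a minimal sketch, assuming a gait phase normalized to [0, 1), a constant-phase-rate state [phase, phase rate], and hand-picked noise parameters `q` and `r` (the function name and all parameters are hypothetical), a forward Kalman filter that suppresses abrupt changes in a raw phase sequence could look like this. A forward (causal) filter is used because it only needs past samples, which suits real-time control.

```python
import numpy as np

def kalman_smooth_phase(raw_phase, dt, q=1e-3, r=1e-2):
    """Forward Kalman filter over a noisy gait-phase sequence in [0, 1).

    State x = [phase, phase_rate]; q scales the process noise and r is the
    measurement-noise variance. Both defaults are illustrative, not tuned values.
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])  # phase advances by rate * dt
    H = np.array([[1.0, 0.0]])             # only the phase is measured
    Q = q * np.array([[dt**3 / 3, dt**2 / 2],
                      [dt**2 / 2, dt]])    # white-noise-acceleration model
    R = np.array([[r]])
    x = np.array([raw_phase[0], 0.0])      # start at the first raw phase, zero rate
    P = np.eye(2)
    smoothed = np.empty(len(raw_phase))
    for k, z in enumerate(raw_phase):
        # Predict.
        x = F @ x
        P = F @ P @ F.T + Q
        # Wrap the innovation to [-0.5, 0.5) so the rollover from ~1 back to ~0
        # at the start of a new gait cycle is not treated as a jump.
        y = (z - x[0] + 0.5) % 1.0 - 0.5
        S = H @ P @ H.T + R                # 1x1 innovation covariance
        K = P @ H.T / S                    # 2x1 Kalman gain
        x = x + K[:, 0] * y
        x[0] %= 1.0                        # keep the phase estimate in [0, 1)
        P = (np.eye(2) - K @ H) @ P
        smoothed[k] = x[0]
    return smoothed
```

A larger `r` relative to `q` smooths more aggressively but makes the estimate lag behind real changes in walking speed, so the two have to be balanced.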
In this paper, we consider the inventory management (IM) problem, where we need to make replenishment decisions for a large number of stock keeping units (SKUs) to balance their supply and demand. In our setting, the constraint on shared resources (such as inventory capacity) couples the otherwise independent control of each SKU. We formulate the problem with this structure as a Shared-Resource Stochastic Game (SRSG) and propose an efficient algorithm called Context-aware Decentralized PPO (CD-PPO). Through extensive experiments, we demonstrate that CD-PPO can accelerate the learning procedure compared with standard MARL algorithms.
We consider an offline reinforcement learning (RL) setting where the agent needs to learn from a dataset collected by rolling out multiple behavior policies. This setting poses two challenges: 1) The optimal trade-off between optimizing the RL signal and the behavior cloning (BC) signal changes across states due to the variation in action coverage induced by different behavior policies. Previous methods fail to handle this because they only control the global trade-off. 2) For a given state, the action distribution generated by different behavior policies may have multiple modes. The BC regularizers in many previous methods are mean-seeking, resulting in policies that select out-of-distribution (OOD) actions in the middle of the modes. In this paper, we address both challenges by using an adaptively weighted reverse Kullback-Leibler (KL) divergence as the BC regularizer on top of the TD3 algorithm. Our method not only trades off the RL and BC signals with per-state weights (i.e., strong BC regularization on states with narrow action coverage, and vice versa) but also avoids selecting OOD actions thanks to the mode-seeking property of reverse KL. Empirically, our algorithm can outperform existing offline RL algorithms on the MuJoCo locomotion tasks with the standard D4RL datasets as well as with mixed datasets that combine the standard datasets.
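The mean-seeking versus mode-seeking distinction can be checked numerically. The sketch below is illustrative only and is not the authors' TD3-based algorithm: it assumes a hypothetical one-dimensional bimodal behavior action distribution with modes at -2 and +2, fits a Gaussian policy of fixed width (sigma = 0.5, an assumption) by scanning its mean, and compares minimizing forward KL(beta || pi) with minimizing reverse KL(pi || beta).

```python
import numpy as np

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def kl(p, q):
    """Discrete KL(p || q) after normalizing both densities on the same grid."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def fit_policy_means(sigma=0.5):
    x = np.linspace(-6.0, 6.0, 2001)  # 1-D action grid (assumed)
    # Hypothetical behavior data: two behavior policies with modes at -2 and +2.
    beta = 0.5 * gaussian(x, -2.0, 0.5) + 0.5 * gaussian(x, 2.0, 0.5)
    mus = np.linspace(-4.0, 4.0, 801)
    fwd = [kl(beta, gaussian(x, m, sigma)) for m in mus]  # KL(beta || pi): mean-seeking
    rev = [kl(gaussian(x, m, sigma), beta) for m in mus]  # KL(pi || beta): mode-seeking
    return mus[int(np.argmin(fwd))], mus[int(np.argmin(rev))]

mu_fwd, mu_rev = fit_policy_means()
print(f"forward KL picks mean {mu_fwd:+.2f}, reverse KL picks mean {mu_rev:+.2f}")
```

The forward-KL fit lands between the two modes (near 0), an action neither behavior policy takes, while the reverse-KL fit lands on one of the modes. This is the OOD-action failure the abstract attributes to mean-seeking regularizers.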
Recently, spoken dialogue systems have been widely deployed in a variety of applications, serving a huge number of end users. A common issue is that errors resulting from noisy utterances, semantic misunderstandings, or lack of knowledge make it hard for a real system to respond properly, possibly leading to an unsatisfactory user experience. To avoid such cases, we consider a proactive interaction mechanism in which the system predicts user satisfaction with a candidate response before giving it to the user. If the prediction indicates that the user is unlikely to be satisfied, the system asks the user a suitable question to determine their real intent instead of providing the response directly. Through such interaction, the system can give the user a better response. Previous models that predict user satisfaction are not applicable to DuerOS, a large-scale commercial dialogue system. They are based on hand-crafted features and thus can hardly learn the complex patterns underlying millions of conversations or the temporal dependencies across multiple turns of a conversation. Moreover, they are trained and evaluated on benchmark datasets with adequate labels, which are expensive to obtain in a commercial dialogue system. To address these challenges, we propose a pipeline that predicts user satisfaction to help DuerOS decide whether to ask for clarification in each turn. Specifically, we first generate a large number of weak labels and then train a transformer-based model on these weak labels to predict user satisfaction. We deploy and evaluate our model on DuerOS and observe a 19% relative improvement in the accuracy of user satisfaction prediction and a 2.3% relative improvement in user experience.
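The need to train DNN models on end-user devices (e.g., smartphones) is growing, driven by the need to improve data privacy and reduce communication overhead. Unlike datacenter servers with powerful CPUs and GPUs, modern smartphones consist of a diverse collection of specialized cores following a System-on-Chip (SoC) architecture that together perform a variety of tasks. We observe that training DNNs on smartphone SoCs without carefully considering their resource constraints not only leads to suboptimal training performance but also significantly degrades user experience. In this paper, we present Swan, a neural engine that optimizes DNN training on smartphone SoCs without compromising user experience. Extensive large-scale evaluations show that Swan improves performance by 1.2x to 23.3x over the state of the art.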
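Ads allocation, which involves allocating ads and organic items to limited feed slots to maximize platform revenue, has become a research hotspot. E-commerce platforms usually have multiple entrances for different categories, and some entrances receive few visits. These entrances have low data coverage, which makes it difficult for the agent to learn. To address this challenge, we propose Similarity-based Hybrid Transfer for Ads Allocation (SHTAA), which effectively transfers samples and knowledge from data-rich entrances to data-poor entrances. Specifically, we define an uncertainty-aware similarity for MDPs to estimate the similarity between the MDPs of different entrances. Based on this similarity, we design a hybrid transfer method, consisting of instance transfer and policy transfer, to efficiently transfer samples and knowledge from one entrance to another. Both offline and online experiments on the Meituan food delivery platform demonstrate that the proposed method achieves better performance on data-poor entrances and increases platform revenue.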
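With the recent prevalence of reinforcement learning (RL), there has been growing interest in leveraging RL for ads allocation on recommendation platforms (e.g., e-commerce and news feed sites). To achieve better allocation, the input of recent RL-based ads allocation methods has been upgraded from point-wise single items to list-wise item arrangements. However, this also leads to a high-dimensional space of state-action pairs, which makes it difficult to learn list-wise representations with good generalization ability. This further hinders the exploration of RL agents and results in poor sample efficiency. To address this problem, we propose a novel RL-based approach for ads allocation that learns better list-wise representations by leveraging task-specific signals on the Meituan food delivery platform. Specifically, we propose three auxiliary tasks based on reconstruction, prediction, and contrastive learning, respectively, according to prior domain knowledge of ads allocation. We conduct extensive experiments on the Meituan food delivery platform to evaluate the effectiveness of the proposed auxiliary tasks. Both offline and online experimental results show that, compared with state-of-the-art baselines, the proposed method learns better list-wise representations and achieves higher platform revenue.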
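The abstract does not specify the contrastive objective used by the auxiliary task. InfoNCE is one common choice; the sketch below assumes (hypothetically) that row i of `anchors` and row i of `positives` are embeddings of two views of the same list-wise item arrangement, with the other rows in the batch serving as negatives. The function name and `temperature` default are illustrative.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss: row i of `positives` is the positive for row i of `anchors`;
    every other row in the batch acts as a negative."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                 # cosine similarities, shape (B, B)
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))     # cross-entropy on the matching pairs
```

Minimizing this loss pulls the two views of the same arrangement together and pushes apart representations of different arrangements in the batch. In an RL setting such a term would be added to the agent's main loss with some weight, which is not given in the abstract.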
Deep learning models can achieve high accuracy when trained on large amounts of labeled data. However, real-world scenarios often involve several challenges: Training data may become available in installments, may originate from multiple different domains, and may not contain labels for training. Certain settings, for instance medical applications, often involve further restrictions that prohibit retention of previously seen data due to privacy regulations. In this work, to address such challenges, we study unsupervised segmentation in continual learning scenarios that involve domain shift. To that end, we introduce GarDA (Generative Appearance Replay for continual Domain Adaptation), a generative-replay based approach that can adapt a segmentation model sequentially to new domains with unlabeled data. In contrast to single-step unsupervised domain adaptation (UDA), continual adaptation to a sequence of domains enables leveraging and consolidation of information from multiple domains. Unlike previous approaches in incremental UDA, our method does not require access to previously seen data, making it applicable in many practical scenarios. We evaluate GarDA on two datasets with different organs and modalities, where it substantially outperforms existing techniques.
The development of social media user stance detection and bot detection methods relies heavily on large-scale, high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, which hinders graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB is built on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain, together with user tweet features, as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when multiple relations are introduced. By analyzing the experimental results, we identify effective approaches for account detection and suggest potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
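The abstract does not describe how information gain was computed for feature selection. A minimal sketch of ranking features by information gain, assuming discrete (or pre-binned) feature values and discrete labels such as human versus bot, is shown below; the function names are illustrative.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (bits) of a discrete label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - sum_v P(X = v) * H(Y | X = v) for a discrete feature X."""
    values, counts = np.unique(feature, return_counts=True)
    conditional = sum(
        c / len(feature) * entropy(labels[feature == v]) for v, c in zip(values, counts)
    )
    return entropy(labels) - conditional

def top_k_features(X, y, k):
    """Return the indices of the k columns of X with the highest IG, plus all gains."""
    gains = np.array([information_gain(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(gains)[::-1][:k], gains
```

Continuous-valued property features would need to be binned before this computation, and the choice of bins changes the resulting gains and therefore which features make the top 20.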
As one of the prevalent methods for building automation systems, Imitation Learning (IL) has shown promising performance in a wide range of domains. However, despite considerable improvements in policy performance, research on the explainability of IL models is still limited. Inspired by recent approaches in explainable artificial intelligence, we propose a model-agnostic explanation framework for IL models called R2RISE. R2RISE aims to explain the overall policy performance with respect to the frames in the demonstrations. It iteratively retrains the black-box IL model on randomly masked demonstrations and uses the conventional evaluation outcome, the environment return, as the coefficient to build an importance map. We also conducted experiments to investigate three major questions concerning the equality of frame importance, the effectiveness of the importance map, and the connections between importance maps from different IL models. The results show that R2RISE successfully distinguishes important frames in the demonstrations.
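One reading of the aggregation described above follows the RISE idea of weighting random masks by the black-box outcome. A minimal sketch, assuming Bernoulli frame masks with keep probability `keep_prob` and a hypothetical `train_and_evaluate(mask)` callback that retrains the IL model on the kept frames and returns the environment return, is shown below. In the toy usage, the expensive retraining is replaced by a linear stand-in with made-up frame weights.

```python
import numpy as np

def rise_frame_importance(train_and_evaluate, n_frames, n_masks=500, keep_prob=0.5, seed=0):
    """RISE-style importance over demonstration frames.

    train_and_evaluate(mask) is a hypothetical callback: retrain the IL model on
    the frames where mask == 1 and return the resulting environment return.
    """
    rng = np.random.default_rng(seed)
    importance = np.zeros(n_frames)
    for _ in range(n_masks):
        mask = (rng.random(n_frames) < keep_prob).astype(float)  # 1 = frame kept
        ret = train_and_evaluate(mask)                             # black-box retrain + evaluate
        importance += ret * mask                                   # return-weighted mask
    # Normalize by the number of masks and the expected mask value (RISE convention).
    return importance / (n_masks * keep_prob)

if __name__ == "__main__":
    # Toy stand-in for retraining: frames 2 and 7 matter most (hypothetical weights).
    weights = np.full(10, 0.1)
    weights[[2, 7]] = 5.0
    imp = rise_frame_importance(lambda m: float(m @ weights), n_frames=10, n_masks=2000)
    print(np.argsort(imp)[::-1][:2])
```

Each mask requires a full retraining run in practice, so `n_masks` trades the variance of the importance map against compute cost.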